Monitoring tool usage in cataract surgery videos using boosted convolutional and recurrent neural networks
Abstract
With an estimated 19 million operations performed annually, cataract surgery is the most common surgical procedure. This paper investigates the automatic monitoring of tool usage during cataract surgery, with potential applications in report generation, surgical training and real-time decision support. In this study, tool usage is monitored in videos recorded through the surgical microscope. Following state-of-the-art video analysis solutions, each frame of the video is analyzed by convolutional neural networks (CNNs) whose outputs are fed to recurrent neural networks (RNNs) in order to take temporal relationships between events into account. The novelty lies in the way those CNNs and RNNs are trained. Computational complexity prevents the end-to-end training of “CNN+RNN” systems. Therefore, CNNs are usually trained first, independently from the RNNs. This approach is clearly suboptimal for surgical tool analysis: many tools are very similar to one another, but they can generally be differentiated based on past events. CNNs should therefore be trained to extract the visual features that are most useful in combination with the temporal context. A novel boosting strategy is proposed to achieve this goal: the CNN and RNN parts of the system are simultaneously enriched by progressively adding weak classifiers (either CNNs or RNNs) trained to improve the overall classification accuracy. Experiments were performed on a new dataset of 50 cataract surgery videos in which the usage of 21 surgical tools was manually annotated. Very good classification performance is achieved on this dataset: tool usage could be labeled with an average area under the ROC curve of Az = 0.9717 in offline mode (using past, present and future information) and Az = 0.9696 in online mode (using past and present information only).
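To make the reported metric concrete, the following is a minimal sketch of how an average area under the ROC curve over several tools could be computed from per-frame labels and scores. It is not the authors' evaluation code: the function names `roc_auc` and `mean_auc`, the toy data, and the choice of an unweighted mean over tools are all assumptions, since the abstract does not specify the exact protocol.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the pairwise (Mann-Whitney) identity.

    labels: 1 if the tool is in use in a frame, 0 otherwise.
    scores: classifier confidence for that tool in the same frames.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative frame")
    # Count positive/negative pairs that are ranked correctly;
    # tied scores count as half a correct pair.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_auc(
    labels_per_tool: Sequence[Sequence[int]],
    scores_per_tool: Sequence[Sequence[float]],
) -> float:
    """Unweighted mean of per-tool AUCs (an assumed averaging scheme)."""
    aucs = [roc_auc(y, s) for y, s in zip(labels_per_tool, scores_per_tool)]
    return sum(aucs) / len(aucs)


# Toy example with two hypothetical tools and a handful of frames.
toy_labels = [[0, 1], [0, 0, 1, 1]]
toy_scores = [[0.2, 0.9], [0.1, 0.4, 0.35, 0.8]]
toy_mean = mean_auc(toy_labels, toy_scores)
```

The pairwise formulation is quadratic in the number of frames, which is fine for illustration; a rank-based implementation would be preferable for full-length videos.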
Similar resources
Hand Gesture Recognition from RGB-D Data using 2D and 3D Convolutional Neural Networks: a comparative study
Despite considerable advances in recognizing hand gestures from still images, there are still many challenges in the classification of hand gestures in videos. The latter comes with more challenges, including higher computational complexity and the arduous task of representing temporal features. Hand movement dynamics, represented by temporal features, have to be extracted by analyzing the total fr...
Speech Emotion Recognition Using Scalogram Based Deep Structure
Speech Emotion Recognition (SER) is an important part of speech-based Human-Computer Interface (HCI) applications. Previous SER methods rely on the extraction of features and training an appropriate classifier. However, most of those features can be affected by emotionally irrelevant factors such as gender, speaking styles and environment. Here, an SER method has been proposed based on a concat...
Tool Detection and Operative Skill Assessment in Surgical Videos Using Region-Based Convolutional Neural Networks
Five billion people in the world lack access to quality surgical care. Surgeon skill varies dramatically, and many surgical patients suffer complications and avoidable harm. Improving surgical training and feedback will help to reduce the rate of complications—half of which have been shown to be preventable. To do this, it is essential to assess operative skills, a process that is currently man...
A hybrid EEG-based emotion recognition approach using Wavelet Convolutional Neural Networks (WCNN) and support vector machine
Nowadays, deep learning and convolutional neural networks (CNNs) have become widespread tools in many biomedical engineering studies. A CNN is an end-to-end tool that integrates the processing procedure, but in some situations it needs to be fused with machine learning methods to be more accurate. In this paper, a hybrid approach based on deep features extracted from Wave...
Decision Support System for Age-Related Macular Degeneration Using Convolutional Neural Networks
Introduction: Age-related macular degeneration (AMD) is one of the major causes of visual loss among the elderly. It causes degeneration of cells in the macula. Early diagnosis can be helpful in preventing blindness. Drusen are the initial symptoms of AMD. Since drusen have a wide variety, locating them in screening images is difficult and time-consuming. An automated digital fundus photography...
Journal title:
- CoRR
Volume abs/1710.01559
Publication date 2017